Understanding Network Files and Network Configuration
This page describes how to interpret the .txt files in the ruby/network/simple/Network_Files directories.
1. All the NUCA_* network files describe a non-uniform cache access (NUCA) architecture network; see [www.cs.utexas.edu/ftp/pub/dburger/papers/ieee_micro04_nuca.pdf] for a description of what this means. All the NUCA_* files were generated manually and should NOT be used as examples.
2. GarnetFileMaker.py generates a mesh of tiles. To run it, type:
./GarnetFileMaker.py <num_rows> <num_columns>
For example, ./GarnetFileMaker.py 4 4 generates a 4x4 mesh of tiles, and ./GarnetFileMaker.py 8 4 generates an 8x4 mesh of tiles. Consider a 4x4 mesh of tiles generated by GarnetFileMaker.py:
ext_node:L1Cache:0 int_node:0 link_latency:1
ext_node:L2Cache:0 int_node:0 link_latency:1
ext_node:L1Cache:1 int_node:1 link_latency:1
ext_node:L2Cache:1 int_node:1 link_latency:1
ext_node:L1Cache:2 int_node:2 link_latency:1
ext_node:L2Cache:2 int_node:2 link_latency:1
ext_node:L1Cache:3 int_node:3 link_latency:1
ext_node:L2Cache:3 int_node:3 link_latency:1
ext_node:L1Cache:4 int_node:4 link_latency:1
ext_node:L2Cache:4 int_node:4 link_latency:1
ext_node:L1Cache:5 int_node:5 link_latency:1
ext_node:L2Cache:5 int_node:5 link_latency:1
ext_node:L1Cache:6 int_node:6 link_latency:1
ext_node:L2Cache:6 int_node:6 link_latency:1
ext_node:L1Cache:7 int_node:7 link_latency:1
ext_node:L2Cache:7 int_node:7 link_latency:1
ext_node:L1Cache:8 int_node:8 link_latency:1
ext_node:L2Cache:8 int_node:8 link_latency:1
ext_node:L1Cache:9 int_node:9 link_latency:1
ext_node:L2Cache:9 int_node:9 link_latency:1
ext_node:L1Cache:10 int_node:10 link_latency:1
ext_node:L2Cache:10 int_node:10 link_latency:1
ext_node:L1Cache:11 int_node:11 link_latency:1
ext_node:L2Cache:11 int_node:11 link_latency:1
ext_node:L1Cache:12 int_node:12 link_latency:1
ext_node:L2Cache:12 int_node:12 link_latency:1
ext_node:L1Cache:13 int_node:13 link_latency:1
ext_node:L2Cache:13 int_node:13 link_latency:1
ext_node:L1Cache:14 int_node:14 link_latency:1
ext_node:L2Cache:14 int_node:14 link_latency:1
ext_node:L1Cache:15 int_node:15 link_latency:1
ext_node:L2Cache:15 int_node:15 link_latency:1
int_node:0 int_node:1 link_latency:1 link_weight:1
int_node:1 int_node:2 link_latency:1 link_weight:1
int_node:2 int_node:3 link_latency:1 link_weight:1
int_node:4 int_node:5 link_latency:1 link_weight:1
int_node:5 int_node:6 link_latency:1 link_weight:1
int_node:6 int_node:7 link_latency:1 link_weight:1
int_node:8 int_node:9 link_latency:1 link_weight:1
int_node:9 int_node:10 link_latency:1 link_weight:1
int_node:10 int_node:11 link_latency:1 link_weight:1
int_node:12 int_node:13 link_latency:1 link_weight:1
int_node:13 int_node:14 link_latency:1 link_weight:1
int_node:14 int_node:15 link_latency:1 link_weight:1
int_node:0 int_node:4 link_latency:1 link_weight:2
int_node:4 int_node:8 link_latency:1 link_weight:2
int_node:8 int_node:12 link_latency:1 link_weight:2
int_node:1 int_node:5 link_latency:1 link_weight:2
int_node:5 int_node:9 link_latency:1 link_weight:2
int_node:9 int_node:13 link_latency:1 link_weight:2
int_node:2 int_node:6 link_latency:1 link_weight:2
int_node:6 int_node:10 link_latency:1 link_weight:2
int_node:10 int_node:14 link_latency:1 link_weight:2
int_node:3 int_node:7 link_latency:1 link_weight:2
int_node:7 int_node:11 link_latency:1 link_weight:2
int_node:11 int_node:15 link_latency:1 link_weight:2
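The listing above follows a regular pattern, so files like this are usually generated rather than written by hand. Below is a minimal Python sketch in the spirit of GarnetFileMaker.py (not its actual source) that emits the same format for an arbitrary rows x columns mesh; the script and function names are made up for the example, and the weight convention (1 for row links, 2 for column links) is copied from the listing:

import sys

def make_mesh(rows, cols):
    # One tile per internal node: an L1 and an L2 controller bound to each node.
    lines = []
    for n in range(rows * cols):
        lines.append("ext_node:L1Cache:%d int_node:%d link_latency:1" % (n, n))
        lines.append("ext_node:L2Cache:%d int_node:%d link_latency:1" % (n, n))
    # East-west links within each row (weight 1, as in the listing above).
    for r in range(rows):
        for c in range(cols - 1):
            n = r * cols + c
            lines.append("int_node:%d int_node:%d link_latency:1 link_weight:1" % (n, n + 1))
    # North-south links within each column (weight 2).
    for c in range(cols):
        for r in range(rows - 1):
            n = r * cols + c
            lines.append("int_node:%d int_node:%d link_latency:1 link_weight:2" % (n, n + cols))
    return "\n".join(lines)

if __name__ == "__main__":
    rows, cols = int(sys.argv[1]), int(sys.argv[2])
    print(make_mesh(rows, cols))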
What does the network file above actually mean?
Each int_node specifies a location in the network; you can think of it as a socket.
The line ext_node:L1Cache:0 int_node:0 link_latency:1 declares an L1Cache controller to be used in the coherence protocol and binds it to int_node 0. Similarly, the line ext_node:L2Cache:0 int_node:0 link_latency:1 declares an L2Cache controller and binds it to the same int_node 0. Binding an L1 and an L2 to the same node in this fashion forms a tile, because the two caches are co-located in the network. Whether the L2Cache controllers manage private L2s or the banks of a shared L2 depends on the protocol; the MESI_CMP protocol, for instance, treats the L2 as a shared banked cache. There must be as many L1Cache ext_node declarations as there are processors, but there can be an arbitrary number of L2Cache declarations, depending on what you are implementing. The banking of the L2 cache is determined by mapping functions in the L1Cache controllers, and those functions can be changed to implement whatever mapping is desired: the L2 caches can be made private by mapping each core's accesses to its own L2 bank, or shared by interleaving addresses across the L2 banks.
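For illustration, here are the two mapping policies just described, written as a minimal Python sketch. The real mapping functions live in the L1Cache controller sources; the function names, the 64-byte line size, and the bank count below are assumptions made for the example:

CACHE_LINE_BYTES = 64   # assumed cache line size, not taken from the simulator
NUM_L2_BANKS = 16       # one bank per tile in the 4x4 mesh above

def shared_interleaved_bank(addr):
    # Shared banked L2: consecutive cache lines are interleaved across all banks.
    return (addr // CACHE_LINE_BYTES) % NUM_L2_BANKS

def private_bank(addr, core_id):
    # Private L2: every access from a core goes to that core's own bank.
    return core_id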
The line int_node:3 int_node:7 link_latency:1 link_weight:2 connects tile 3 to tile 7. Taken together, these lines describe a mesh: after the L1s and L2s are declared and bound to their tiles, the tiles are interconnected as a mesh.

You can go further and build a mesh of meshes to model a multi-chip network. To do that, you need to decide where the inter-chip links will be located, as well as the inter-chip link latency and bandwidth. There are two options for interconnecting multiple chips. The first is to overload some processors/tiles (at the center or the corners) and let them act as ingress/egress ports; however, this makes the on-chip traffic non-uniform. The second is to modify the simulator so that it can declare another type of node, an InterChipLink, and declare int_nodes for those. Because cross-chip traffic then goes through a dedicated inter-chip router rather than through any processor, no processor handles more traffic than another and on-chip traffic remains uniform.

You are also not constrained to a mesh: it should be easy to interconnect the tiles in a torus, a crossbar, point-to-point, or any other topology. Finally, you have the option of specifying a dance-hall CMP instead of a tiled CMP. To do that, reserve separate network spots for the L1s and the L2s and then interconnect them. For example, here is an 8-processor + 4-L2 network:
ext_node:L1Cache:0 int_node:0 link_latency:1
ext_node:L1Cache:1 int_node:1 link_latency:1
ext_node:L1Cache:2 int_node:2 link_latency:1
ext_node:L1Cache:3 int_node:3 link_latency:1
ext_node:L1Cache:4 int_node:4 link_latency:1
ext_node:L1Cache:5 int_node:5 link_latency:1
ext_node:L1Cache:6 int_node:6 link_latency:1
ext_node:L1Cache:7 int_node:7 link_latency:1
ext_node:L2Cache:0 int_node:8 link_latency:1
ext_node:L2Cache:1 int_node:9 link_latency:1
ext_node:L2Cache:2 int_node:10 link_latency:1
ext_node:L2Cache:3 int_node:11 link_latency:1
int_node:0 int_node:8 link_latency:1 link_weight:1
int_node:1 int_node:8 link_latency:1 link_weight:1
int_node:2 int_node:8 link_latency:1 link_weight:1
int_node:3 int_node:8 link_latency:1 link_weight:1
int_node:4 int_node:8 link_latency:1 link_weight:1
int_node:5 int_node:8 link_latency:1 link_weight:1
int_node:6 int_node:8 link_latency:1 link_weight:1
int_node:7 int_node:8 link_latency:1 link_weight:1
int_node:0 int_node:9 link_latency:1 link_weight:1
int_node:1 int_node:9 link_latency:1 link_weight:1
int_node:2 int_node:9 link_latency:1 link_weight:1
int_node:3 int_node:9 link_latency:1 link_weight:1
int_node:4 int_node:9 link_latency:1 link_weight:1
int_node:5 int_node:9 link_latency:1 link_weight:1
int_node:6 int_node:9 link_latency:1 link_weight:1
int_node:7 int_node:9 link_latency:1 link_weight:1
int_node:0 int_node:10 link_latency:1 link_weight:1
int_node:1 int_node:10 link_latency:1 link_weight:1
int_node:2 int_node:10 link_latency:1 link_weight:1
int_node:3 int_node:10 link_latency:1 link_weight:1
int_node:4 int_node:10 link_latency:1 link_weight:1
int_node:5 int_node:10 link_latency:1 link_weight:1
int_node:6 int_node:10 link_latency:1 link_weight:1
int_node:7 int_node:10 link_latency:1 link_weight:1
int_node:0 int_node:11 link_latency:1 link_weight:1
int_node:1 int_node:11 link_latency:1 link_weight:1
int_node:2 int_node:11 link_latency:1 link_weight:1
int_node:3 int_node:11 link_latency:1 link_weight:1
int_node:4 int_node:11 link_latency:1 link_weight:1
int_node:5 int_node:11 link_latency:1 link_weight:1
int_node:6 int_node:11 link_latency:1 link_weight:1
int_node:7 int_node:11 link_latency:1 link_weight:1
So there are 8 processors, each with its own L1, and 4 L2s (L2 banks). Each L2 bank is connected to all of the L1s.
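As with the mesh, the full L1-to-L2 crossbar above is regular enough to generate. A minimal Python sketch, with the same caveat that the function name is invented for the example:

def make_dance_hall(num_l1, num_l2):
    # L1s occupy int_nodes 0..num_l1-1; L2s occupy the next num_l2 nodes.
    lines = []
    for i in range(num_l1):
        lines.append("ext_node:L1Cache:%d int_node:%d link_latency:1" % (i, i))
    for j in range(num_l2):
        lines.append("ext_node:L2Cache:%d int_node:%d link_latency:1" % (j, num_l1 + j))
    # Connect every L1 node to every L2 node (a full crossbar).
    for j in range(num_l2):
        for i in range(num_l1):
            lines.append("int_node:%d int_node:%d link_latency:1 link_weight:1" % (i, num_l1 + j))
    return "\n".join(lines)

print(make_dance_hall(8, 4))   # reproduces the 8-processor + 4-L2 listing above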
Finally, you have to configure the memory controllers. A memory controller is declared as follows: ext_node:Directory:x int_node:y link_latency:1 bw_multiplier:64, where x is the memory controller number and y is the internal node the memory controller is bound to. As described in "Achieving Predictable Performance through Better Memory Controller Placement in Many-Core CMPs", the best placement of memory controllers in a mesh is a diamond or diagonal X pattern. So, ideally, in a tiled mesh, the tiles along the diagonal X or diamond are identified and a memory controller is bound to each of them. If you are implementing a dance-hall style CMP, the memory controllers should be bound to the L2 caches.
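For example, assuming four memory controllers placed along one diagonal of the 4x4 mesh above (tiles 0, 5, 10, 15; an X placement would also use the anti-diagonal tiles 3, 6, 9, 12), the declarations would look like this:

ext_node:Directory:0 int_node:0 link_latency:1 bw_multiplier:64
ext_node:Directory:1 int_node:5 link_latency:1 bw_multiplier:64
ext_node:Directory:2 int_node:10 link_latency:1 bw_multiplier:64
ext_node:Directory:3 int_node:15 link_latency:1 bw_multiplier:64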